Skip to content

fix(portForwarding): serialize concurrent PF rule creation per VIP to prevent duplicates#3345

Open
zstack-robot-1 wants to merge 15 commits into5.5.6from
sync/ye.zou/fix/ZSTAC-77673
Open

fix(portForwarding): serialize concurrent PF rule creation per VIP to prevent duplicates#3345
zstack-robot-1 wants to merge 15 commits into5.5.6from
sync/ye.zou/fix/ZSTAC-77673

Conversation

@zstack-robot-1
Copy link
Collaborator

Problem\nConcurrent APICreatePortForwardingRuleMsg requests for the same VIP could create duplicate port forwarding rules with overlapping port ranges. The interceptor checks for port overlap before the handler, but without synchronization, two concurrent requests can both pass the check and persist duplicate rules.\n\nResolves: ZSTAC-77673\n\n## Root Cause\nThe handle(APICreatePortForwardingRuleMsg) method persists the PortForwardingRuleVO without any per-VIP synchronization. The existing PortForwardingApiInterceptor checks for VIP port overlap, but the window between the interceptor check and handler persist() allows race conditions.\n\n## Fix\n1. Wrap the CREATE handler in thdf.chainSubmit(new ChainTask) with sync signature portforwardingrule-vip-{vipUuid} — same pattern as the DELETE handler\n2. Re-check VIP port overlap inside the synchronized ChainTask before persist\n3. Add chain.next() at all async exit points (VIP acquire callbacks + FlowChain done/error)\n\nThis serializes all port forwarding rule creation per VIP, eliminating the race condition window.\n\n## Test Plan\n- Verify creating port forwarding rules normally still works\n- Verify concurrent creation of rules with overlapping ports on the same VIP returns error for the second request\n- Verify rules on different VIPs can still be created concurrently

sync from gitlab !9174

…isk offering

Resolves: ZSTAC-74683

Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
When anti-split-brain check selects a disconnected MDS node, the HTTP
call now times out after 30s instead of 5+ minutes, and automatically
retries the next available MDS via tryNext mechanism.

Resolves: ZSTAC-80595

Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
When MN restarts during a destroy operation, the hypervisor may report
the VM as Stopped. Without this transition, the state machine throws
an exception and the VM stays stuck in Destroying state forever.

Resolves: ZSTAC-80620

Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
…pacity

Resolves: ZSTAC-79709

Change-Id: I45a2133bbb8c51c25ae3549d59e588976192a08d
@coderabbitai
Copy link

coderabbitai bot commented Feb 13, 2026

Walkthrough

对若干子系统进行了小范围逻辑调整:端口转发创建改为基于 ChainTask 的同步链式处理并增加重叠检查;VM NIC、镜像根盘大小和实例状态、L3 网络空闲 IP 过滤、Ceph 容量释放策略及 ZBS HTTP 调用重试行为均做了局部修改。

Changes

Cohort / File(s) Summary
端口转发管理器
plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java
将 APICreatePortForwardingRuleMsg 处理迁移到基于 VIP UUID 的 ChainTask 同步流;在同步路径添加同 VIP/同协议端口范围重叠检查并在冲突时发布错误并提前 chain.next();延后 VIP 获取,重构 flow/chain 调用与任务命名。
计算/网卡防护
compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java
在 afterAddIpAddress 和 afterDelIpAddress 中对 VmNicVO 增加 null-guard,找不到 NIC 时记录调试并提前返回,避免后续假定 NIC 存在的操作。
镜像/实例规格与状态
header/src/main/java/org/zstack/header/vm/VmInstanceSpec.java, header/src/main/java/org/zstack/header/vm/VmInstanceState.java
当 rootDiskOffering 为 null 时,getRootDiskAllocateSize 使用 image.virtualSize 与 image.actualSize 的最大值;在 VmInstanceState 的 Destroying 状态添加对 stopped 事件的转移到 Stopped 状态。
L3 网络地址分配
network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java
APIGetFreeIpMsg 返回的 free IP 列表中排除了位于 L3 网络保留 IP 段(reservedIpRanges)内的地址,减少可分配池。
Ceph 主存储容量释放
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
删除快照成功路径中调用由 osdHelper.releaseAvailableCapacity 改为 osdHelper.releaseAvailableCapWithRatio,改变容量释放为按比率释放的行为。
ZBS 存储 HTTP 调用与重试
plugin/zbs/src/main/java/org/zstack/storage/zbs/ZbsStorageController.java
将对 GetVolumeClients 的同步 HTTP 调用替换为新的 HttpCaller 泛型调用并启用 tryNext(尝试下一个 MDS);新增 HttpCaller.setTryNext(boolean) 链式设置方法以支持多 MDS 重试语义。

序列图

sequenceDiagram
    actor Client
    participant PortForwardingManager
    participant ChainTask
    participant Database
    participant VIPService

    Client->>PortForwardingManager: APICreatePortForwardingRuleMsg
    PortForwardingManager->>ChainTask: 提交 ChainTask(key: vipUuid)
    ChainTask->>Database: 查询同 VIP、同协议的现有规则
    Database-->>ChainTask: 返回规则列表
    alt 存在端口范围重叠
        ChainTask->>PortForwardingManager: 发布错误事件
        ChainTask-->>Client: 返回错误响应
    else 无重叠
        ChainTask->>VIPService: 获取 VIP 详情
        VIPService-->>ChainTask: VIP 信息
        ChainTask->>Database: 持久化端口转发规则
        Database-->>ChainTask: 持久化成功
        ChainTask->>PortForwardingManager: 发布成功事件
        ChainTask-->>Client: 返回创建结果
    end
Loading

预估代码审查工作量

🎯 4 (复杂) | ⏱️ ~45 分钟

诗歌

🐰 链上跳跃步步忙,
VIP 前先把门藏,
重叠端口早发现,
异步同步舞翩跹,
转发规则更安康!


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Title check ❌ Error PR标题超过72个字符限制(88个字符),且格式符合要求。 将标题缩短至72个字符以内,例如:'fix(portForwarding): serialize PF creation per VIP to prevent duplicates'
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed PR描述清晰详细地说明了问题、根本原因、修复方案和测试计划,与变更内容高度相关。
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into 5.5.6
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sync/ye.zou/fix/ZSTAC-77673

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.5)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java`:
- Around line 740-741: The FlowChain name in PortForwardingManagerImpl is
misspelled as "create-portforwading"; update the string passed to
flowChain.setName(...) (the FlowChain created via
FlowChainBuilder.newShareFlowChain() and stored in variable flowChain) to the
correct spelling "create-portforwarding" so the chain name is accurate.

Comment on lines +740 to +741
FlowChain flowChain = FlowChainBuilder.newShareFlowChain();
flowChain.setName("create-portforwading");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

FlowChain 名称存在拼写错误。

"create-portforwading" 应为 "create-portforwarding"(缺少字母 'r')。

🐛 建议修复
 FlowChain flowChain = FlowChainBuilder.newShareFlowChain();
-flowChain.setName("create-portforwading");
+flowChain.setName("create-portforwarding");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
FlowChain flowChain = FlowChainBuilder.newShareFlowChain();
flowChain.setName("create-portforwading");
FlowChain flowChain = FlowChainBuilder.newShareFlowChain();
flowChain.setName("create-portforwarding");
🤖 Prompt for AI Agents
In
`@plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java`
around lines 740 - 741, The FlowChain name in PortForwardingManagerImpl is
misspelled as "create-portforwading"; update the string passed to
flowChain.setName(...) (the FlowChain created via
FlowChainBuilder.newShareFlowChain() and stored in variable flowChain) to the
correct spelling "create-portforwarding" so the chain name is accurate.

AlanJager and others added 11 commits February 13, 2026 13:32
Resolves: ZSTAC-82153

Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
…fterAddIpAddress to prevent NPE during rollback

Resolves: ZSTAC-81741

Change-Id: I53bcf20a10306afc7b6172da294d347b74e6c41f
Resolves: ZSTAC-81182

Change-Id: Id1bb642154dc66ae9995dcc4d9fc00cdce9bcaf8
<fix>[vm]: add Destroying->Stopped state transition

See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume

See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients

See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation

See merge request zstackio/zstack!9155
<fix>[network]: filter reserved IPs in getFreeIp

See merge request zstackio/zstack!9170
<fix>[ceph]: apply over-provisioning ratio when releasing snapshot capacity

See merge request zstackio/zstack!9162
<fix>[network]: add null check for L3 network system tags in API interceptor

See merge request zstackio/zstack!9169
Resolves: ZSTAC-77673

Change-Id: I7fa4cd853dddb1f9a7b7d90bd3731d2559ba6542
@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/fix/ZSTAC-77673 branch from 719563c to 252bdc7 Compare February 16, 2026 07:46
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java (1)

56-64: ⚠️ Potential issue | 🟠 Major

UsedIpVO 在 NIC 存在性校验之前已被更新,可能导致数据不一致

Line 58 的 SQL 更新将 UsedIpVO.vmNicUuid 设置为传入的 vmNicUUid,但 NIC 存在性检查在 Lines 61-64 才执行。如果 NIC 不存在,UsedIpVO 已经关联到一个无效的 vmNicUuid,造成数据库状态不一致。

建议将 null 检查移到 SQL 更新之前:

🐛 建议的修复方案
 `@Override`
 public void afterAddIpAddress(String vmNicUUid, String usedIpUuid) {
-    /* update UsedIpVO */
-    SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid, usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update();
-
     VmNicVO nic = Q.New(VmNicVO.class).eq(VmNicVO_.uuid, vmNicUUid).find();
     if (nic == null) {
         logger.debug(String.format("VmNic[uuid:%s] not found, skip afterAddIpAddress", vmNicUUid));
         return;
     }
+
+    /* update UsedIpVO */
+    SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid, usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update();

     UsedIpVO temp = null;
🤖 Fix all issues with AI agents
In `@network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java`:
- Around line 1078-1083: The removal logic may call NetworkUtils.isInRange with
mismatched IP versions (IPv4 vs IPv6) causing exceptions; update the block that
iterates self.getReservedIpRanges() and freeIpInventorys (the lambda using
freeIp and reservedIpRanges.stream) to first filter reserved ranges by the same
IP version as freeIp (use whatever helper/method the file uses elsewhere for
version checking, consistent with the pattern around lines ~486-487), then call
NetworkUtils.isInRange only for ranges with matching versions so ipv4/ipv6
conversions are not mixed and exceptions are avoided.

Comment on lines +1078 to +1083

Set<ReservedIpRangeVO> reservedIpRanges = self.getReservedIpRanges();
if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) {
freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream().anyMatch(
r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp())));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

IP 版本不匹配可能导致异常

当前代码在比较 free IP 与 reserved IP range 时,未按 IP 版本过滤。若 free IP 是 IPv4 而 reserved range 是 IPv6(或相反),NetworkUtils.isInRange 内部会调用错误的转换方法(如 ipv4StringToLong 处理 IPv6 地址),可能抛出异常。

参考同文件第 486-487 行的已有模式,应先按 IP 版本过滤 reserved ranges:

🔧 建议修复
         Set<ReservedIpRangeVO> reservedIpRanges = self.getReservedIpRanges();
         if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) {
-            freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream().anyMatch(
-                    r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp())));
+            freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream()
+                    .filter(r -> r.getIpVersion() == freeIp.getIpVersion())
+                    .anyMatch(r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp())));
         }
🤖 Prompt for AI Agents
In `@network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java` around lines
1078 - 1083, The removal logic may call NetworkUtils.isInRange with mismatched
IP versions (IPv4 vs IPv6) causing exceptions; update the block that iterates
self.getReservedIpRanges() and freeIpInventorys (the lambda using freeIp and
reservedIpRanges.stream) to first filter reserved ranges by the same IP version
as freeIp (use whatever helper/method the file uses elsewhere for version
checking, consistent with the pattern around lines ~486-487), then call
NetworkUtils.isInRange only for ranges with matching versions so ipv4/ipv6
conversions are not mixed and exceptions are avoided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants